PSCI 3300.003 Political Science Research Methods
A. Jordan Nafa
University of North Texas
September 27th, 2022
Formal and informal representations of theories of politics
Connecting research question, data, and theory gives us substance
Directed Acyclic Graphs (Greenland, Pearl, and Robins 1999; VanderWeele and Robins 2007)
From concept to measurement
Problems of conceptual stretching
Measurement validity
\[ \definecolor{treat}{RGB}{27,208,213} \definecolor{outcome}{RGB}{98,252,107} \definecolor{baseconf}{RGB}{244,199,58} \definecolor{covariates}{RGB}{178,26,1} \definecolor{index}{RGB}{37,236,167} \definecolor{timeid}{RGB}{244,101,22} \definecolor{mu}{RGB}{71,119,239} \definecolor{sigma}{RGB}{219,58,7} \newcommand{normalcolor}{\color{white}} \newcommand{treat}[1]{\color{treat} #1 \normalcolor} \newcommand{resp}[1]{\color{outcome} #1 \normalcolor} \newcommand{sample}[1]{\color{baseconf} #1 \normalcolor} \newcommand{covar}[1]{\color{covariates} #1 \normalcolor} \newcommand{obs}[1]{\color{index} #1 \normalcolor} \newcommand{tim}[1]{\color{timeid} #1 \normalcolor} \newcommand{mean}[1]{\color{mu} #1 \normalcolor} \newcommand{vari}[1]{\color{sigma} #1 \normalcolor} \]
Potential Outcomes: A way of formally expressing each of the possible outcomes unit \(i\) could experience under a given treatment regime
Causal Effect: The change we would expect to observe if we altered some feature of the world compared to what would have happened in the absence of such a manipulation
Estimand: The causal effect we are trying to estimate. The estimand connects research question, theory, and statistics.
Directed Acyclic Graphs (DAGs): Non-parametric graphical representations of causal relationships.
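To make the definitions above concrete, here is one common way to write them in potential-outcomes notation. This is only a sketch assuming a binary treatment; the symbols \(Y_{i}(1)\), \(Y_{i}(0)\), \(\tau_{i}\), and \(\mathrm{ATE}\) are illustrative and do not appear elsewhere in these slides.

```latex
\begin{align*}
\tau_{i} &= Y_{i}(1) - Y_{i}(0) \\
\mathrm{ATE} &= \mathbb{E}\left[Y_{i}(1) - Y_{i}(0)\right]
\end{align*}
```

Only one of the two potential outcomes is ever observed for unit \(i\), so \(\tau_{i}\) cannot be computed directly. An estimand such as the average treatment effect is a population-level quantity we can try to recover instead.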
Experimental
Observational
You don’t have control over which units get assigned the treatment and thus cannot rely on random assignment to identify causal effects
This makes causal inference difficult, but by no means impossible, in many areas of political science and most of international relations
Common misconception that experiments can “prove” causality
Directed Acyclic Graphs (DAGs)
Directed
Acyclic
Graph
DAGs are a type of non-parametric graphical causal model
Graphical model of the data generation process (DGP)
Maps your theory of the process to a model
Fancy math called “do-calculus” tells you what to adjust for to isolate and identify causation
Makes our assumptions about the causal process explicit
What if there’s something that really is cyclical?
\(\mathrm{Democracy} \longrightarrow \mathrm{Development} \longrightarrow \mathrm{Democracy}\)
This isn’t acyclic!
\(\mathrm{Democracy} \longleftrightarrow \mathrm{Development}\)
Remember the causal ordering assumption we talked about last time
If a process is truly cyclical, we can represent that using a time index
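As a rough illustration of the time-index idea, the sketch below uses the dagitty package to check both versions of the graph. The node names with `_t1`, `_t2`, and `_t3` suffixes are made up for this example and are not part of the course materials.

```r
library(dagitty)

# The cyclical version: Democracy -> Development -> Democracy
cyclic <- dagitty("dag {
  Democracy -> Development
  Development -> Democracy
}")

# The time-indexed version: each node is a concept measured at a given period
# (the _t1, _t2, _t3 suffixes are illustrative names, not from the slides)
indexed <- dagitty("dag {
  Democracy_t1 -> Development_t2
  Development_t2 -> Democracy_t3
}")

# isAcyclic() reports whether a graph is free of directed cycles
cyclic_ok <- isAcyclic(cyclic)
indexed_ok <- isAcyclic(indexed)
print(c(cyclic = cyclic_ok, indexed = indexed_ok))
```

Once each concept is split into period-specific nodes, the arrows only ever point forward in time, which is why the indexed graph can be treated as a DAG.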
Imagine we are interested in whether the adoption of gender quotas has a causal effect on the level of female representation in a country’s government.
Our first step is to make a list of the theoretically relevant variables.
Gender quota is the treatment and female representation is the outcome of interest
What other factors could influence Quota adoption, female representation in government, or both?
Political corruption (Esarey and Schwindt-Bayer 2019; Stockemer 2011)
Electoral institutions (Paxton, Hughes, and Green 2006; Schwindt-Bayer 2009; Tripp and Kang 2008)
Women’s civil society engagement (Hughes, Krook, and Paxton 2015; Irvine 2007)
Attitudes towards women in society and politics (Alexander 2012)
Other country-specific or time varying features of the world that are more difficult to observe (Paxton and Hughes 2015; Paxton, Hughes, and Painter 2010)?
Gender Quotas \(\treat{X}_{\tim{t}}\) cause an increase in Female Representation \(\resp{Y}_{\tim{t}}\)
Political Corruption \(\covar{Z}_{\tim{t}}\) has a causal effect on Female Representation and Quota Adoption
Electoral institutions \(\covar{V}_{\tim{t}}\) have an effect on Female Representation and Quota Adoption
Women’s civil society organizations \(\covar{W}_{\tim{t}}\) have an effect on Female Representation and Quota Adoption
Attitudes towards women in politics \(\covar{U}_{\tim{t}}\) are affected by Quota Adoption and have an effect on Female Representation
Women’s civil society organizations \(\covar{W}_{\tim{t}}\) have an effect on attitudes towards women in politics \(\covar{U}_{\tim{t}}\)
Time invariant differences \(\upsilon\)?
We can then express each node as a function of those that influence it
\(\mathrm{\resp{Female~Representation}}_{\tim{t}} = f(\treat{X}_{\tim{t}}, \covar{Z}_{\tim{t}}, \covar{V}_{\tim{t}}, \covar{W}_{\tim{t}}, \covar{U}_{\tim{t}}, \upsilon)\)
\(\mathrm{\treat{Gender~Quota}}_{\tim{t}} = f(\covar{V}_{\tim{t}}, \covar{W}_{\tim{t}}, \covar{Z}_{\tim{t}})\)
\(\mathrm{\covar{Political~Corruption}}_{\tim{t}} = f(\covar{V}_{\tim{t}}, \upsilon, \dots)\)
\(\mathrm{\covar{Electoral~Institutions}}_{\tim{t}} = f(\upsilon, \dots)\)
\(\mathrm{\covar{Womens~CSOs}}_{\tim{t}} = f(\upsilon, \dots)\)
\(\mathrm{\covar{Gender~Attitudes}}_{\tim{t}} = f(\treat{X}_{\tim{t}}, \covar{W}_{\tim{t}}, \upsilon, \dots)\)
Which of these do we need to measure to identify the path \(\mathrm{\treat{Gender~Quota}}_{\tim{t}} \longrightarrow \mathrm{\resp{Female~Representation}}_{\tim{t}}\)?
Using theory and existing research, we’ve identified possible relationships between concepts (nodes)
We only care about \(\treat{X}_{\tim{t}} \longrightarrow \resp{Y}_{\tim{t}}\), but what do we do about all the other nodes?
A causal effect is identified if the association between treatment and outcome is properly stripped and isolated
Arrows in a DAG transmit associations between nodes
We need to redirect and control these paths by adjusting or conditioning in order to isolate the relationship we are interested in
Without additional, often strong, assumptions, it is only possible to adjust for things we can observe
\(\treat{X}\) causes \(\resp{Y}\)
But \(\covar{Z}\) causes both \(\treat{X}\) and \(\resp{Y}\)
\(\covar{Z}\) confounds the \(\treat{X} \longrightarrow \resp{Y}\) association
Failing to adjust for the confounder \(\covar{Z}\) results in omitted variable bias
The non-causal path \(\treat{X} \longleftarrow \covar{Z} \longrightarrow \resp{Y}\) is called a backdoor path
\(\treat{X}\) and \(\resp{Y}\) are d-connected because associations can pass through a third variable, \(\covar{Z}\)
The relationship between \(\treat{X}\) and \(\resp{Y}\) is not causally identified unless we close the backdoor created by \(\covar{Z}\)
To do this, we need to somehow adjust for \(\covar{Z}\)
Most common method of adjustment is to include \(\covar{Z}\) in a regression model
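A small simulation can show what closing the backdoor does in practice. This is a minimal sketch in base R under an assumed data generating process; the coefficients (0.8, 0.5, 1.0) and the sample size are arbitrary choices for illustration, not estimates from any real data.

```r
set.seed(3300)
n <- 10000

# Assumed data generating process: Z causes both X and Y
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 0.5 * x + 1.0 * z + rnorm(n) # True effect of X on Y is 0.5

# Leaving Z out of the model keeps the backdoor open
naive <- coef(lm(y ~ x))[["x"]]

# Including Z in the regression closes the backdoor
adjusted <- coef(lm(y ~ x + z))[["x"]]

print(round(c(naive = naive, adjusted = adjusted), 2))
```

Under these made-up values the naive estimate should land well above 0.5 because it absorbs part of the association that flows through \(\covar{Z}\), while the adjusted estimate should sit close to the true effect.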
Returning to our earlier example, what do we need to adjust for to identify the causal path \(\treat{X}_{\tim{t}} \longrightarrow \resp{Y}_{\tim{t}}\) under the assumed data generation process?
The answer is fairly straightforward in this example, but for illustrative purposes we’ll use the dagitty and ggdag packages to figure it out
Our first step is to define the DAG for our theoretical model
# Load the packages used to build and analyze the DAG
library(dplyr)
library(dagitty)
library(ggdag)
# Define the DAG for the causal relationship
quota_dag <- dagify(
  Y ~ X + Z + V + W + U + C,
  Z ~ V + C,
  V ~ C,
  X ~ V + W + Z,
  U ~ X + W + C,
  W ~ C,
  labels = list(X = "X[t]", Y = "Y[t]", Z = "Z[t]", V = "V[t]",
                W = "W[t]", U = "U[t]", C = "upsilon"),
  coords = list(
    x = c(X = 0, Y = 1, Z = 1, V = 0, W = 0, U = 1, C = 0.5),
    y = c(X = 0, Y = 0, Z = 0.5, V = 0.5, W = -0.5, U = -0.5, C = 1)
  ),
  outcome = "Y",
  exposure = "X",
  latent = "C"
) %>%
  # Convert the DAG into a tibble
  tidy_dagitty() %>%
  # Set node status
  node_status()
Next we can use the adjustmentSets function from the dagitty package to get the nodes we need to adjust for to identify the path \(\treat{X}_{\tim{t}} \longrightarrow \resp{Y}_{\tim{t}}\)
# Get the minimum adjustment set
quota_dag$dag %>%
  adjustmentSets()
#> { V, W, Z }
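One way to build intuition for this result is to ask dagitty whether other candidate sets would also work. The sketch below rewrites the same assumed structure directly in dagitty syntax (with `C` standing in for \(\upsilon\)) and checks three candidate sets with isAdjustmentSet. The object names `g`, `valid_full`, `valid_no_z`, and `valid_with_u` are introduced here for illustration only.

```r
library(dagitty)

# Same assumed structure as quota_dag, one edge per line
g <- dagitty("dag {
  X [exposure]
  Y [outcome]
  C [latent]
  X -> Y
  Z -> Y
  V -> Y
  W -> Y
  U -> Y
  C -> Y
  V -> Z
  C -> Z
  C -> V
  V -> X
  W -> X
  Z -> X
  X -> U
  W -> U
  C -> U
  C -> W
}")

# The set reported above
valid_full <- isAdjustmentSet(g, c("V", "W", "Z"))

# Dropping Z leaves the backdoor X <- Z -> Y open
valid_no_z <- isAdjustmentSet(g, c("V", "W"))

# Adding U conditions on a descendant of the treatment
valid_with_u <- isAdjustmentSet(g, c("V", "W", "Z", "U"))

print(c(full = valid_full, no_z = valid_no_z, with_u = valid_with_u))
```

Only the first set should pass. Adjusting for too little leaves a backdoor open, and adjusting for \(\covar{U}_{\tim{t}}\) blocks part of the effect we want to estimate, since it lies on the path from quotas to representation.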
Then we use adjust_for from ggdag to adjust for the variables in the output and use mutate to wrangle some new columns for color and transparency
# Adjust for the variables in the output
quota_dag_adjusted <- quota_dag %>%
  adjust_for(var = c("V", "W", "Z")) %>%
  # Create a column color coding the nodes and edges
  mutate(status_color = case_when(
    is.na(status) | status == "latent" ~ as.character(adjusted),
    TRUE ~ as.character(status)
  )) %>%
  # Adjust the length of the arrows slightly
  shorten_dag_arrows(., proportion = 0.08)
# Generate the DAG for the contemporaneous effect of X on Y
ggplot(quota_dag_adjusted, aes(x = x, y = y, xend = xend, yend = yend)) +
  # Add the graph edges
  geom_dag_edges(
    aes(x = xstart, y = ystart, edge_alpha = if_else(status_color == "adjusted", 0, 1)),
    edge_width = 1.5,
    edge_color = "white",
    arrow_directed = grid::arrow(length = grid::unit(10, "pt"), type = "closed"),
    show.legend = FALSE
  ) +
  # Add the graph nodes
  geom_dag_node(alpha = 0) +
  # Add the graph text
  geom_dag_text(
    aes(label = label, color = status_color),
    parse = TRUE,
    size = 22,
    family = "serif",
    show.legend = FALSE
  ) +
  # Apply the theme settings for the slides
  dag_theme(.base_size = 24) +
  # Set the color scale for the text
  scale_color_manual(values = c("#B21A01", "#00F2FF", "#62FC6B", "#9C9C9C"))
\(\treat{X}\) causes \(\resp{Y}\)
But \(\treat{X}\) also causes \(\covar{Z}\) which has a causal effect on \(\resp{Y}\)
Should you adjust for \(\covar{Z}\)?
No! In this case \(\covar{Z}\) is a mediator or an intermediate outcome
Adjusting for an intermediate outcome induces post-treatment bias (Montgomery, Nyhan, and Torres 2018)
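The simplest version of this problem can be seen in a simulation. The sketch below is base R under an assumed data generating process in which \(\treat{X}\) affects \(\resp{Y}\) both directly and through \(\covar{Z}\); the coefficients are arbitrary illustrative values, and it covers only the case where adjusting for the mediator removes the indirect part of the effect.

```r
set.seed(3300)
n <- 10000

# Assumed data generating process: X -> Z -> Y plus a direct X -> Y path
x <- rnorm(n)
z <- 0.8 * x + rnorm(n)           # Z is an intermediate outcome of X
y <- 0.3 * x + 0.5 * z + rnorm(n) # Total effect of X is 0.3 + 0.8 * 0.5 = 0.7

# Without Z in the model we recover the total effect of X
total <- coef(lm(y ~ x))[["x"]]

# Adjusting for the mediator strips out the part that flows through Z
direct_only <- coef(lm(y ~ x + z))[["x"]]

print(round(c(total = total, direct_only = direct_only), 2))
```

If the question is about the overall effect of \(\treat{X}\), the second model answers a different question than the one we asked.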
\(\treat{X}\) causes \(\covar{Z}\)
\(\resp{Y}\) also causes \(\covar{Z}\)
Should you adjust for \(\covar{Z}\)?
No! In this case \(\covar{Z}\) is a collider, a common effect of both \(\treat{X}\) and \(\resp{Y}\)
Colliders can create fake causal effects or hide real ones
Conditioning on a collider induces endogenous selection bias
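To see how a collider can manufacture an association, consider the following base R sketch. It assumes \(\treat{X}\) and \(\resp{Y}\) are completely unrelated and that both cause \(\covar{Z}\); the variable names and unit coefficients are illustrative only.

```r
set.seed(3300)
n <- 10000

# Assumed data generating process: X and Y are independent, both cause Z
x <- rnorm(n)
y <- rnorm(n)
z <- x + y + rnorm(n)

# Without the collider, the estimate is close to the true value of zero
unadjusted <- coef(lm(y ~ x))[["x"]]

# Conditioning on the collider creates an association that is not causal
collider_adjusted <- coef(lm(y ~ x + z))[["x"]]

print(round(c(unadjusted = unadjusted, collider_adjusted = collider_adjusted), 2))
```

In this setup the model that includes \(\covar{Z}\) should report a clearly negative coefficient on \(\treat{X}\) even though there is no effect at all, which is the "fake causal effect" described above.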
The Journal of Politics (2020)
Gender Quotas, Women's Representation, and Legislative Diversity
Tiffany D. Barnes; Mirya R. Holman
The adoption of gender quotas contributes to the erosion of gendered understandings of “candidate quality” and leads to the expansion of party recruitment networks
Over time, this leads both to increases in female representation, the goal of gender quotas, and to greater legislative diversity by changing candidate recruitment
Authors theorize about a causal mechanism, a complex process by which things influence one another both directly and indirectly to produce some outcome
They test their argument on subnational data from Argentina between 2004 and 2018
Download the replication data for the article from the JOP dataverse archive
If you installed git when you were installing R and RStudio, you can just type git pull in the terminal window in RStudio from within the course’s R project to get the data and R code
If you haven’t installed git, you should install git because it will allow you to automatically sync your local files with updates I make to the course repository instead of having to download them from Canvas (i.e., for problem sets, data, code, and lecture slides)
# Set Session Options
options(
  digits = 6, # Significant figures output
  scipen = 999, # Disable scientific notation
  repos = getOption("repos")["CRAN"], # Install packages from CRAN
  knitr.kable.NA = '', # Set NA values to blank when making tables
  brms.backend = "cmdstanr", # Use cmdstanr as a backend for {brms}
  modelsummary_get = "broom" # Custom tidy method for modelsummary
)
# Load the necessary libraries
pacman::p_load(
  "sjlabelled", # Package for working with labeled data
  "tidyverse", # Suite of packages for data management
  "haven", # Reading in data from proprietary software (Stata, SPSS, SAS)
  "brms", # Bayesian regression models with Stan
  "tidybayes", # Functions for wrangling posteriors tidy-style
  "modelsummary", # Package for making tables of model output
  "kableExtra", # Package for customizing html and latex tables
  "patchwork", # Combining multiple plots into one
  "future", # Package for parallel computation
  install = FALSE
)
Assuming you’re in the course’s project directory, you should be able to load the custom functions we’ll use here with the following code
# Source the helper functions, assumes you're working in the course's R project
.helpers <- map(
  .x = list.files(
    path = "functions/",
    pattern = "\\.R$", # Match only files ending in .R
    full.names = TRUE
  ),
  .f = ~ source(.x)
)
We need to use the read_dta function from haven since the replication data is in Stata’s proprietary file format
# Load the replication data
jop_data <- read_dta("data/Barnes_and_Holman_JOP2020.dta")
# Extract the data for the models
model_data <- jop_data %>%
  # Select will return a subset with just the specified variables
  select(chamber_year:senate, diversity_person:logdm) %>%
  # Get the time index
  mutate(
    # Apply value labels from chamber year
    chamber_year = factor(
      chamber_year,
      levels = get_values(chamber_year),
      labels = get_labels(chamber_year)
    ),
    # Get the year from chamber_year
    year = str_extract(chamber_year, "20[0-9][0-9]") %>% as.integer()
  )
\[ \begin{align*} \resp{y}_{\obs{i}} &\sim \mathcal{N}(\mean{\mu}_{\obs{i}}, \sigma^{2})\\ \mean{\mu}_{\obs{i}} &= \alpha + \beta_{1} \mathrm{Female~Representation} + \beta_{2}\mathrm{Time~Since~Quota} + \\ & \quad \beta_{3} \mathrm{Gender\text{-}Related~Development} + \beta_{4}\mathrm{Unemployment} + \\ & \quad \beta_{5}\mathrm{Log~District~Magnitude} + \beta_{6}\mathrm{Senate~Chamber} \end{align*} \]
We need to start by building the model formulas, and to save time we’ll use a custom function formula_builder that we loaded earlier
# Character vector of response variables for each model in table 1 on p. 1279
responses <- c(
  "prof_diversity",
  "diversity_person",
  "prof_diversity_f",
  "diversity_person_f",
  "prof_diversity_m",
  "diversity_person_m"
)
# Formula for the RHS, which is the same across models in table 1
rhs_form <- "~female + quotayear + logdm + senate + unemployment + gdi"
# Build the list of formulas
model_forms <- formula_builder(lhs = responses, rhs = rhs_form)
We need to specify priors on the model parameters (more on this in later classes) and it’s usually a good idea to save each model to local storage so you don’t have to run things again
# Specify the priors object to pass to brm
# Y in the prior strings refers to the response vector in the Stan code brms generates
model_priors <- prior(normal(0, 1.5), class = "b") +
  prior(normal(mean(Y), 2*sd(Y)), class = "Intercept") +
  prior(exponential(1/sd(Y)), class = "sigma")
# Build a vector of file paths to save the models to
model_paths <- str_c(
  "models/Barnes-and-Holman-2020/gaussian_",
  responses,
  "_full"
)
Since I’m lazy, we’re just going to fit each model iteratively using the map function from purrr. Basically, for each index in the list object model_forms, we fit the model and the result is a list of model objects of the same length
# Fit each of the models (6 chains, 5k iterations)
bayes_gaussian_fits <- map(
  .x = seq_along(model_forms),
  .f = ~ brm(
    formula = bf(model_forms[[.x]], decomp = "QR"),
    family = brmsfamily("gaussian", link = "identity"),
    prior = model_priors,
    data = model_data,
    cores = 6, # Adjust chains and cores based on your computer's hardware
    chains = 6, # Should be at least 4
    iter = 5000,
    warmup = 3000,
    refresh = 1000,
    control = list(max_treedepth = 12),
    save_pars = save_pars(all = TRUE),
    seed = 12345,
    backend = "cmdstanr",
    file = model_paths[.x]
  )
)
The following code, written as a for loop, is equivalent to that on the previous slide but has the disadvantage of being much less elegant
# Initialize a list to store the models in
bayes_gaussian_fits <- list()
# Fit each of the models (6 chains, 5k iterations)
for (i in seq_along(model_forms)) {
  bayes_gaussian_fits[[i]] <- brm(
    formula = bf(model_forms[[i]], decomp = "QR"),
    family = brmsfamily("gaussian", link = "identity"),
    prior = model_priors,
    data = model_data,
    cores = 6,
    chains = 6,
    iter = 5000,
    warmup = 3000,
    refresh = 1000,
    control = list(max_treedepth = 12),
    save_pars = save_pars(all = TRUE),
    seed = 12345,
    backend = "cmdstanr",
    file = model_paths[i]
  )
}
All text and images in this course are made available for public non-commercial use under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
All R, HTML, and CSS code is provided for public use under a BSD 3-Clause License.